SMS Spam Filtering Using Probabilistic Topic Modelling and Stacked Denoising Autoencoder

نویسندگان

  • Noura Al Moubayed
  • Toby P. Breckon
  • Peter Matthews
  • A. Stephen McGough
چکیده

In This paper we present a novel approach to spam filtering and demonstrate its applicability with respect to SMS messages. Our approach requires minimum features engineering and a small set of labelled data samples. Features are extracted using topic modelling based on latent Dirichlet allocation, and then a comprehensive data model is created using a Stacked Denoising Autoencoder (SDA). Topic modelling summarises the data providing ease of use and high interpretability by visualising the topics using word clouds. Given that the SMS messages can be regarded as either spam (unwanted) or ham (wanted), the SDA is able to model the messages and accurately discriminate between the two classes without the need for a pre-labelled training set. The results are compared against the state-of-the-art spam detection algorithms with our proposed approach achieving over 97% accuracy which compares favourably to the best reported algorithms presented in the literature.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Real-time Dynamic MRI Reconstruction using Stacked Denoising Autoencoder

In this work we address the problem of real-time dynamic MRI reconstruction. There are a handful of studies on this topic; these techniques are either based on compressed sensing or employ Kalman Filtering. These techniques cannot achieve the reconstruction speed necessary for real-time reconstruction. In this work, we propose a new approach to MRI reconstruction. We learn a non-linear mapping ...

متن کامل

Decoding Stacked Denoising Autoencoders

Data representation in a stacked denoising autoencoder is investigated. Decoding is a simple technique for translating a stacked denoising autoencoder into a composition of denoising autoencoders in the ground space. In the infinitesimal limit, a composition of denoising autoencoders is reduced to a continuous denoising autoencoder, which is rich in analytic properties and geometric interpretat...

متن کامل

PHD: A Probabilistic Model of Hybrid Deep Collaborative Filtering for Recommender Systems

Collaborative Filtering (CF), a well-known approach in producing recommender systems, has achieved wide use and excellent performance not only in research but also in industry. However, problems related to cold start and data sparsity have caused CF to attract an increasing amount of attention in efforts to solve these problems. Traditional approaches adopt side information to extract effective...

متن کامل

SMS spam filtering: Methods and data

Mobile or SMS spam is a real and growing problem primarily due to the availability of very cheap bulk pre-pay SMS packages and the fact that SMS engenders higher response rates as it is a trusted and personal service. SMS spam filtering is a relatively new task which inherits many issues and solutions from email spam filtering. However it poses its own specific challenges. This paper motivates ...

متن کامل

Marginalized Stacked Denoising Autoencoders

Stacked Denoising Autoencoders (SDAs) [4] have been used successfully in many learning scenarios and application domains. In short, denoising autoencoders (DAs) train one-layer neural networks to reconstruct input data from partial random corruption. The denoisers are then stacked into deep learning architectures where the weights are fine-tuned with back-propagation. Alternatively, the outputs...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016